<!doctype html>
<html>
<head>
	<meta charset="utf-8" />
	<title>CWS Chinese Word Segmentation</title>
	<link href="../../theme/style.css" rel="stylesheet" type="text/css" />
	<script src="../../theme/jquery.js" type="text/javascript"></script>
	<script src="../../theme/wscui.js" type="text/javascript"></script>
	<script src="../../theme/style.js" type="text/javascript"></script>
</head>
<body>
<div class="module">
	<h1>CWS Chinese Word Segmentation</h1>
	<div class="intro">
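		Full name: Chinese Word Segmentation<br />
		Purpose: splits Chinese text into a list of keywords<br />
		Mode: dictionary-based segmentation<br />
		Example: the sentence "服装和服饰" is split into "服装", "和服", "服饰", 3 words in total<br />
		Components: <a href="http://cn2.php.net/mbstring" target="_blank">mbstring</a>, <a href="http://redis.com/" target="_blank">redis</a> (optional, roughly 4 times faster than MySQL)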
	</div>
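	<div>
		<p>The example above can be reproduced with a minimal sketch of dictionary-based segmentation. This is written in Python for illustration only and is not the module's PHP implementation; the function name <code>segment</code>, its parameters, and the three-word toy dictionary are assumptions. It scans every start position and keeps each substring that appears in the dictionary, which is why overlapping words such as "和服" are reported.</p>
<pre>
```python
def segment(text, dictionary, len_min=2, len_max=7):
    """Return every dictionary word found in text, ordered by start position."""
    words = []
    for start in range(len(text)):
        # Try each candidate length from len_min up to len_max characters.
        for length in range(len_min, len_max + 1):
            candidate = text[start:start + length]
            if len(candidate) != length:
                break  # the window ran past the end of the text
            if candidate in dictionary:
                words.append(candidate)
    return words


# Hypothetical toy dictionary; a real dictionary would be far larger.
toy_dict = {"服装", "和服", "服饰"}
print(segment("服装和服饰", toy_dict))  # ['服装', '和服', '服饰']
```
</pre>
	</div>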
	<div class="catalog">
		<div class="no">Methods</div>
		<ol>
			<li><a href="/php/module/cws/del.html">del</a> - delete words</li>
			<li><a href="/php/module/cws/search.html">search</a> - search for words</li>
			<li><a href="/php/module/cws/save.html">save</a> - save words</li>
			<li><a href="/php/module/cws/seg.html">seg</a> - run segmentation</li>
			<li><a href="/php/module/cws/sync.html">sync</a> - synchronize the dictionary</li>
		</ol>
	</div>
	<div class="setting">
		<table>
			<thead>
				<tr>
					<th>Type</th>
					<th>Name</th>
					<th>Default</th>
					<th>Description</th>
				</tr>
			</thead>
			<tbody>
				<tr>
					<td>string</td>
					<td>CWS_DICT_CHARSET</td>
					<td>utf-8</td>
					<td>
						Options: utf-8, gbk, big5
					</td>
				</tr>
				<tr>
					<td>string</td>
					<td>CWS_DICT_TYPE</td>
					<td>file</td>
					<td>
						Options: redis, mysql, file<br />
						Speed: redis (1) > mysql (4) > file (10); the number is the relative lookup cost, so lower is faster
					</td>
				</tr>
				<tr>
					<td>string</td>
					<td>CWS_DICT_TYPE_FILE</td>
					<td>setting/cws/default.dict</td>
					<td>When CWS_DICT_TYPE is file, the full path to the dictionary file</td>
				</tr>
				<tr>
					<td>string</td>
					<td>CWS_DICT_TYPE_REDIS</td>
					<td>cwsdict</td>
					<td>When CWS_DICT_TYPE is redis, the Redis key name prefix</td>
				</tr>
				<tr>
					<td>int</td>
					<td>CWS_SEG_LEN_MAX</td>
					<td>7</td>
					<td>
						Maximum word length, in Chinese characters<br />
						Example: "中华人民共和国" (7 characters) counts as one word
					</td>
				</tr>
				<tr>
					<td>int</td>
					<td>CWS_SEG_LEN_MIN</td>
					<td>2</td>
					<td>
						Minimum word length, in Chinese characters<br />
						Example: "中国" counts as a word, while "中" and "国" on their own do not
					</td>
				</tr>
				<tr>
					<td>array</td>
					<td>CWS_SEG_PUNCT</td>
					<td></td>
					<td>Filter rules for punctuation, symbols, and special characters applied during segmentation</td>
				</tr>
			</tbody>
		</table>
	</div>
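	<div>
		<p>To illustrate how the length and punctuation settings in the table might interact, here is a hedged Python sketch (illustrative only, not the module's PHP code). The <code>SETTINGS</code> dictionary, the helper names <code>split_on_punct</code> and <code>candidates</code>, and the sample punctuation list are all assumptions; in particular, CWS_SEG_PUNCT has no documented default, and treating it as a flat list of delimiter characters is a guess about its format.</p>
<pre>
```python
import re

# Defaults copied from the settings table; the punctuation list is hypothetical.
SETTINGS = {
    "CWS_SEG_LEN_MAX": 7,
    "CWS_SEG_LEN_MIN": 2,
    "CWS_SEG_PUNCT": ["，", "。", "！", "？", ",", ".", " "],
}


def split_on_punct(text, punct):
    """Split text into fragments at any run of the given punctuation characters."""
    pattern = "[" + re.escape("".join(punct)) + "]+"
    return [part for part in re.split(pattern, text) if part]


def candidates(fragment, len_min, len_max):
    """List every substring whose length lies between len_min and len_max."""
    out = []
    for start in range(len(fragment)):
        for length in range(len_min, len_max + 1):
            piece = fragment[start:start + length]
            if len(piece) == length:
                out.append(piece)
    return out


fragments = split_on_punct("中国，中华人民共和国。", SETTINGS["CWS_SEG_PUNCT"])
print(fragments)  # ['中国', '中华人民共和国']
for fragment in fragments:
    found = candidates(fragment, SETTINGS["CWS_SEG_LEN_MIN"], SETTINGS["CWS_SEG_LEN_MAX"])
    print(fragment, len(found))
```
</pre>
		<p>With these bounds, single characters such as "中" are never looked up, and the 7-character "中华人民共和国" is the longest window that can still match as one word.</p>
	</div>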
	<div class="setting">
		<table>
			<thead>
				<tr>
					<th>Type</th>
					<th>Name</th>
					<th>Default</th>
					<th>Description</th>
				</tr>
			</thead>
			<tbody>
				<tr>
					<td>int</td>
					<td>seg_count_line</td>
					<td>0</td>
					<td>Number of sentences the current segmentation run split the text into</td>
				</tr>
				<tr>
					<td>int</td>
					<td>seg_count_seek</td>
					<td>0</td>
					<td>Number of dictionary lookups performed in the current segmentation run</td>
				</tr>
				<tr>
					<td>int</td>
					<td>seg_count_word</td>
					<td>0</td>
					<td>Number of valid words extracted in the current segmentation run</td>
				</tr>
			</tbody>
		</table>
	</div>
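	<div>
		<p>The three counters in the table can be pictured with the following Python sketch. It is an illustration under stated assumptions, not the module's PHP implementation: the class <code>ToySegmenter</code>, its constructor arguments, and the exact points at which each counter is incremented are hypothetical; only the counter names come from the table.</p>
<pre>
```python
import re


class ToySegmenter:
    """Hypothetical segmenter that tracks the three documented counters."""

    def __init__(self, dictionary, len_min=2, len_max=7, punct="，。,. "):
        self.dictionary = set(dictionary)
        self.len_min = len_min
        self.len_max = len_max
        self.punct = punct
        self.seg_count_line = 0  # sentences produced by splitting
        self.seg_count_seek = 0  # dictionary lookups
        self.seg_count_word = 0  # valid words extracted

    def seg(self, text):
        # Counters describe the current run only, so reset them first.
        self.seg_count_line = self.seg_count_seek = self.seg_count_word = 0
        words = []
        pattern = "[" + re.escape(self.punct) + "]+"
        for line in re.split(pattern, text):
            if not line:
                continue
            self.seg_count_line += 1
            for start in range(len(line)):
                for length in range(self.len_min, self.len_max + 1):
                    piece = line[start:start + length]
                    if len(piece) != length:
                        break  # the window ran past the end of the sentence
                    self.seg_count_seek += 1
                    if piece in self.dictionary:
                        self.seg_count_word += 1
                        words.append(piece)
        return words


segmenter = ToySegmenter({"服装", "和服", "服饰", "中国"})
print(segmenter.seg("服装和服饰，中国"))  # ['服装', '和服', '服饰', '中国']
print(segmenter.seg_count_line, segmenter.seg_count_seek, segmenter.seg_count_word)  # 2 11 4
```
</pre>
	</div>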
	
</div>
</body>
</html>